Abstract

The approach of N. Gay for estimating the coverage of a multivalent vaccine from antibody
prevalence data in certain age cohorts is improved by using computer-aided elimination of
variables. In this way, Gay's numerical approximation can be replaced by exact formulas,
which turn out to be surprisingly simple.



1 Introduction 



(1.1) Nigel Gay [Ga] has estimated the coverage of MMR (measles, mumps, rubella) multivalent
vaccination in a fixed age cohort by the following method:

The rates $p(\pm,\pm,\pm)$ of being seropositive for each of the three diseases depend, via a
polynomial system $F$, on the MMR coverage $v$, the exposition factors $e_i$, and the rates $s_i$ of
seroconversion; the index $i = 1, 2, 3$ stands for measles, mumps and rubella, respectively. On the
other hand, it is the $p(\pm,\pm,\pm)$ which can be obtained from the available data. Hence, a
maximum likelihood approach provides estimates of $v$, $e_i$, and $s_i$.

Gay's approach leads to numerical methods of finding values $v, e_i, s_i$ that minimize the distance
between $F_{\pm,\pm,\pm}(v, e_i, s_i)$ and the measured $p(\pm,\pm,\pm)$. The present paper replaces
this part by providing exact formulas describing the inverse of the polynomial map
$F : \mathbb{R}^7 \to \mathbb{R}^8$. Note that the image of $F$ is contained in the hyperplane
$[\sum p(\pm,\pm,\pm) = 1]$, i.e. it is 7-dimensional like the source space of $F$.

The final result providing our estimation of $v, e_i, s_i$ may be found in Theorem (3.2).



(1.2) We make the same three assumptions used by Gay [Ga]:

(1) Vaccinated children who do not seroconvert as a result of vaccination have the same
probability of being seropositive as an unvaccinated child of the same age (i.e., $e_i$).

(2) In a single individual, seroconversion to each vaccine component is independent. 

(3) Risk of exposure to infection is homogeneous within each age cohort and infection with each 
disease is independent. 



However, we eliminate another assumption which is silently made in [Ga], in that we do not assume
that the seroconversion $s_i$ for the $i$-th disease is independent of age.

(1.3) We would like to thank Duco van Straten for the useful discussions concerning the
exciting mathematical pattern hidden in the MMR problem and its solution. Moreover, we are
grateful to Nigel Gay for sending us his manuscript including the data of the ESEN (European
Seroepidemiological Network) Project.
2 The MMR system



(2.1) First, let us recall from [Ga] the variables involved and their mutual relationship. Fixing
one of the age cohorts, we denote by

• $v$ the proportion of children who have received the multivalent vaccine ("MMR coverage"),

• $e_i$ the rate measuring the exposure to natural infection with disease $i$ ("exposition factor"),

• $s_i$ the proportion of children previously with no detectable antibody to disease $i$ who acquire
detectable antibody to disease $i$ when vaccinated ("seroconversion").

The rate measuring the presence of antibodies to disease $i$ under the condition of being vaccinated
may be easily expressed as

$$ q_i = e_i + (1 - e_i)\, s_i \quad \text{with } i = 1, 2, 3. $$

From these data it is possible to obtain information about the expected antibody prevalence in
general. It is encoded in the 8 variables $p(\pm,\pm,\pm)$ with "+" at the $i$-th place standing for
the presence and "-" for the absence of antibodies to the $i$-th disease. Likewise, we may think of
the sign triples as binary numbers between 0 (meaning "- - -") and 7 (meaning "+ + +"); this allows
the shorter description $p(\pm,\pm,\pm) = p(k) = p_k$. The equations are



$$
\begin{aligned}
p_7 &= p(+,+,+) = v\, q_1\, q_2\, q_3 + (1-v)\, e_1\, e_2\, e_3 \\
p_6 &= p(+,+,-) = v\, q_1\, q_2\, (1-q_3) + (1-v)\, e_1\, e_2\, (1-e_3) \\
p_5 &= p(+,-,+) = v\, q_1\, (1-q_2)\, q_3 + (1-v)\, e_1\, (1-e_2)\, e_3 \\
p_4 &= p(+,-,-) = v\, q_1\, (1-q_2)\, (1-q_3) + (1-v)\, e_1\, (1-e_2)\, (1-e_3) \\
p_3 &= p(-,+,+) = v\, (1-q_1)\, q_2\, q_3 + (1-v)\, (1-e_1)\, e_2\, e_3 \\
p_2 &= p(-,+,-) = v\, (1-q_1)\, q_2\, (1-q_3) + (1-v)\, (1-e_1)\, e_2\, (1-e_3) \\
p_1 &= p(-,-,+) = v\, (1-q_1)\, (1-q_2)\, q_3 + (1-v)\, (1-e_1)\, (1-e_2)\, e_3 \\
p_0 &= p(-,-,-) = v\, (1-q_1)\, (1-q_2)\, (1-q_3) + (1-v)\, (1-e_1)\, (1-e_2)\, (1-e_3) .
\end{aligned}
$$
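The eight equations can be evaluated mechanically. The following is a minimal Python sketch of this forward map, not code from [Ga] or from the authors; the function name `prevalence` and the sample parameter values are chosen here purely for illustration.

```python
from itertools import product

def prevalence(v, e, s):
    """Return [p_0, ..., p_7] for coverage v, exposition factors e[i], seroconversions s[i]."""
    # q_i = e_i + (1 - e_i) s_i: antibody rate among the vaccinated
    q = [ei + (1 - ei) * si for ei, si in zip(e, s)]
    p = []
    # product((0, 1), repeat=3) runs through (0,0,0), (0,0,1), ..., (1,1,1),
    # i.e. through k = 0, ..., 7 with the first sign as the most significant bit.
    for signs in product((0, 1), repeat=3):
        vaccinated, unvaccinated = v, 1 - v
        for i, plus in enumerate(signs):
            vaccinated *= q[i] if plus else 1 - q[i]
            unvaccinated *= e[i] if plus else 1 - e[i]
        p.append(vaccinated + unvaccinated)
    return p

# Illustrative (made-up) parameters: 75% coverage, perfect seroconversion.
p = prevalence(0.75, [0.0, 0.0, 0.5], [1.0, 1.0, 1.0])
print(p, sum(p))
```

Whatever parameters are chosen, the eight values add up to 1, which is the hyperplane condition mentioned in (1.1).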



Remark: In [Ga], the variables $v$, $e_i$, $q_i$, and $p_k$ carry a second index pointing to the
special age cohort; $s_i$ does not because of Gay's assumption mentioned at the end of (1.2).

(2.2) The previous equations express the variables $p_k$ in terms of $v, e_i, q_i$ or, since
$s_i = (q_i - e_i)/(1 - e_i)$, in terms of $v, e_i, s_i$. Our goal is to describe the inverse
dependencies, and we proceed in two steps:

First, using elimination theory, we produce in (2.3) and (2.4) for each of the variables
$v, e_i, q_i$ a separate equation with coefficients in the polynomial ring
$\mathbb{Q}[p_0, \dots, p_7]$. The surprising fact will be that all these equations are quadratic.
Then, as a second step, we will check in (2.5) which of the $2^7$ combinations actually provide a
solution to our system. The results of these investigations are gathered in Theorem (2.5).

Before we start this program, we would like to introduce an easy technical trick in which we replace
the variables $p_k$ by symbolic fractions $a_k/n$. This changes the above equations in the
obvious way. For instance, the first one becomes

$$ a_7 = a(+,+,+) = n\, v\, q_1\, q_2\, q_3 + n\, (1 - v)\, e_1\, e_2\, e_3 . $$

Since this manipulation increases both the degree and the number of variables, it seemingly
complicates the problem. However, using computer algebra systems, the computational time decreases
substantially. Moreover, another advantage of our approach is that $\sum_{k=0}^{7} p_k = 1$
translates into $\sum_{k=0}^{7} a_k = n$. In particular, when finally applying our formulas, we may
directly substitute the number of observed probands in each category for the corresponding
variables $a_k$. The number $n$ equals the size of the cohort.

(2.3) Let us start with eliminating $n, e_i, q_i$ to obtain an equation for the variable $v$, which
is, by the way, of major interest. We work with the computer algebra system Singular developed
at the University of Kaiserslautern, [GPS].

Let $R$ be a polynomial ring of characteristic zero with 16 variables $a_k, n, v, e_i, q_i$. For the
monomial order we have to choose a global one, e.g. dp(16). Transforming the 8 equations into an
ideal $I \subseteq R$, the command "eliminate(I, n*e(1)*e(2)*e(3)*q(1)*q(2)*q(3))" produces a
quadratic equation

$$ c_2(a_0, \dots, a_7)\, v^2 - c_2(a_0, \dots, a_7)\, v + c_0(a_0, \dots, a_7) = 0 $$

with huge polynomials $c_2$, $c_0$ of degree 6 in the variables $a_0, \dots, a_7$.

We may also use Singular for the factorization of polynomials. Applied to the coefficient $c_2$ as
well as to the discriminant of our quadratic polynomial, this yields nice results. With

$$
\begin{aligned}
f_1 &:= n = (a_0 + a_3 + a_5 + a_6) + (a_7 + a_4 + a_2 + a_1) \\
f_3 &:= \big( (a_0 + a_3 + a_5 + a_6) - (a_7 + a_4 + a_2 + a_1) \big)\,
        \big( a_0 a_7 + a_3 a_4 + a_5 a_2 + a_6 a_1 \big) \\
    &\quad - 2\, \big( a_0 a_7 (a_0 - a_7) + a_3 a_4 (a_3 - a_4) + a_5 a_2 (a_5 - a_2) + a_6 a_1 (a_6 - a_1) \big) \\
    &\quad + 2\, \big( (a_3 a_5 a_6 + a_0 a_5 a_6 + a_0 a_3 a_6 + a_0 a_3 a_5)
                     - (a_4 a_2 a_1 + a_7 a_2 a_1 + a_7 a_4 a_1 + a_7 a_4 a_2) \big) \\
f_4 &:= \big( a_0^2 a_7^2 + a_3^2 a_4^2 + a_5^2 a_2^2 + a_6^2 a_1^2 \big)
        + 4\, \big( a_0 a_3 a_5 a_6 + a_7 a_4 a_2 a_1 \big) \\
    &\quad - 2\, \big( a_0 a_7 a_3 a_4 + a_0 a_7 a_5 a_2 + a_0 a_7 a_6 a_1
                     + a_3 a_4 a_5 a_2 + a_3 a_4 a_6 a_1 + a_5 a_2 a_6 a_1 \big)
\end{aligned}
$$

we obtain

$$ c_2 = f_1^2 f_4 \quad \text{and} \quad c_2 - 4 c_0 = f_3^2 . $$

In particular, the two solutions for $v$ are

$$ v_{1,2} = \frac{1}{2} \left( 1 \pm \sqrt{\frac{c_2 - 4 c_0}{c_2}} \right)
           = \frac{1}{2} \left( 1 \pm \frac{f_3(a_0, \dots, a_7)}{f_1(a_0, \dots, a_7)\, \sqrt{f_4(a_0, \dots, a_7)}} \right) . $$
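As a quick numerical illustration of the closed formula for $v$, the following Python sketch evaluates $f_1$, $f_3$, $f_4$ and both roots. It is not the Singular session used above; the function name `coverage_roots` and the toy counts are assumptions made for this example only.

```python
from math import sqrt

def coverage_roots(a):
    """Return the two roots v_1, v_2 for counts a = (a_0, ..., a_7); requires f_4 > 0."""
    a0, a1, a2, a3, a4, a5, a6, a7 = a
    x = (a0, a3, a5, a6)   # corners with an even number of plus signs
    y = (a7, a4, a2, a1)   # the respective opposite corners of the cube
    f1 = sum(a)
    f3 = ((sum(x) - sum(y)) * sum(xi * yi for xi, yi in zip(x, y))
          - 2 * sum(xi * yi * (xi - yi) for xi, yi in zip(x, y))
          + 2 * ((a3*a5*a6 + a0*a5*a6 + a0*a3*a6 + a0*a3*a5)
                 - (a4*a2*a1 + a7*a2*a1 + a7*a4*a1 + a7*a4*a2)))
    f4 = (sum((xi * yi) ** 2 for xi, yi in zip(x, y))
          + 4 * (a0*a3*a5*a6 + a7*a4*a2*a1)
          - 2 * (a0*a7*a3*a4 + a0*a7*a5*a2 + a0*a7*a6*a1
                 + a3*a4*a5*a2 + a3*a4*a6*a1 + a5*a2*a6*a1))
    if f4 <= 0:
        raise ValueError("f_4 must be positive for real, well-defined roots")
    ratio = f3 / (f1 * sqrt(f4))
    return (1 + ratio) / 2, (1 - ratio) / 2

# Toy counts (not ESEN data): n = 8 with a_0 = a_1 = 1 and a_7 = 6.
print(coverage_roots((1, 1, 0, 0, 0, 0, 0, 6)))
```

For these toy counts one finds $f_1 = 8$, $f_3 = 24$, $f_4 = 36$, so the two roots are 0.75 and 0.25; they always add up to 1.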



Remarks: 

(1) Note that whenever $v$ solves the equation, then so does $(1 - v)$. This symmetry may easily
be seen in the original 8 equations by switching the variables $e_i$ and $q_i$.

(2) The formulas for $f_1$, $f_3$, and $f_4$ become very natural if we recall that $a_0, a_3, a_5, a_6$
correspond to $a(-,-,-)$, $a(-,+,+)$, $a(+,-,+)$, $a(+,+,-)$, respectively. These variables are
those which have an even number of plus signs.

This fact may be illustrated by imagining the variables $a(\pm,\pm,\pm)$ as sitting in the corners
of a cube. Then, $a_0, a_3, a_5, a_6$ correspond to the vertices of one of the two inscribed regular
tetrahedra. The remaining $a_7, a_4, a_2, a_1$ are contained in the opposite corners, respectively.
[Figure: cube whose eight corners carry the variables $a(\pm,\pm,\pm)$; the surviving labels are
$a_0 = a(-,-,-)$, $a_3 = a(-,+,+)$, $a_6 = a(+,+,-)$, and the corners $(-,-,+)$, $(+,-,-)$, $(+,+,+)$.]



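(3) It has been observed by Duco van Straten that $f_4$ equals the hyperdeterminant of the
three-dimensional matrix $A_{\bullet\bullet\bullet}$ formed by the variables $a(\pm,\pm,\pm)$, cf.
Proposition 14.1.7 in [GKZ]. Moreover, $f_3$ is a linear combination of the derivatives of $f_4$
which follows the usual pattern,

$$ 2 f_3 = \left( \frac{\partial f_4}{\partial a_0} + \frac{\partial f_4}{\partial a_3}
               + \frac{\partial f_4}{\partial a_5} + \frac{\partial f_4}{\partial a_6} \right)
         - \left( \frac{\partial f_4}{\partial a_7} + \frac{\partial f_4}{\partial a_4}
               + \frac{\partial f_4}{\partial a_2} + \frac{\partial f_4}{\partial a_1} \right) . $$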



Finally, we would like to note that the coefficient $c_0$ itself does split into a product of three
quadrics:

$$
c_0 = f_{21}\, f_{22}\, f_{23} \quad \text{with} \quad
\begin{aligned}
f_{21} &= (a_0 + a_4)(a_7 + a_3) - (a_1 + a_5)(a_6 + a_2) \\
f_{22} &= (a_0 + a_2)(a_7 + a_5) - (a_4 + a_6)(a_3 + a_1) \\
f_{23} &= (a_0 + a_1)(a_7 + a_6) - (a_2 + a_3)(a_5 + a_4) .
\end{aligned}
$$
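Combining $c_2 = f_1^2 f_4$, $c_2 - 4c_0 = f_3^2$, and $c_0 = f_{21} f_{22} f_{23}$ gives the polynomial identity $f_1^2 f_4 = f_3^2 + 4 f_{21} f_{22} f_{23}$, which can be checked in exact integer arithmetic. The sketch below does so in Python; the helper names `f_polys` and `identity_holds` are hypothetical, and the first test vector is the first row of counts as read from the table in (4.2).

```python
def f_polys(a):
    """Return (f1, f3, f4, (f21, f22, f23)) for integer counts a = (a_0, ..., a_7)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = a
    x = (a0, a3, a5, a6)   # even number of plus signs
    y = (a7, a4, a2, a1)   # opposite corners
    f1 = sum(a)
    f3 = ((sum(x) - sum(y)) * sum(xi * yi for xi, yi in zip(x, y))
          - 2 * sum(xi * yi * (xi - yi) for xi, yi in zip(x, y))
          + 2 * ((a3*a5*a6 + a0*a5*a6 + a0*a3*a6 + a0*a3*a5)
                 - (a4*a2*a1 + a7*a2*a1 + a7*a4*a1 + a7*a4*a2)))
    f4 = (sum((xi * yi) ** 2 for xi, yi in zip(x, y))
          + 4 * (a0*a3*a5*a6 + a7*a4*a2*a1)
          - 2 * (a0*a7*a3*a4 + a0*a7*a5*a2 + a0*a7*a6*a1
                 + a3*a4*a5*a2 + a3*a4*a6*a1 + a5*a2*a6*a1))
    f21 = (a0 + a4) * (a7 + a3) - (a1 + a5) * (a6 + a2)
    f22 = (a0 + a2) * (a7 + a5) - (a4 + a6) * (a3 + a1)
    f23 = (a0 + a1) * (a7 + a6) - (a2 + a3) * (a5 + a4)
    return f1, f3, f4, (f21, f22, f23)

def identity_holds(a):
    """Check f1^2 f4 == f3^2 + 4 f21 f22 f23 exactly (integers, no rounding)."""
    f1, f3, f4, (f21, f22, f23) = f_polys(a)
    return f1 ** 2 * f4 == f3 ** 2 + 4 * f21 * f22 * f23

for counts in [(156, 2, 3, 2, 1, 6, 1, 38), (1, 1, 1, 1, 1, 1, 1, 2)]:
    print(counts, f_polys(counts), identity_holds(counts))
```

Because the identity is polynomial, it holds for arbitrary integers, not only for counts that come from a valid MMR model.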



(2.4) Now, we focus on the remaining six variables $e_i$ and $q_i$ (and hence $s_i$). Following the
above recipe, we obtain again quadratic equations for each of them, but with much smaller
coefficients. They are no longer of degree 6, but quadratic themselves.

Notation: With $A_{\bullet\bullet\bullet}$ being the three-dimensional matrix formed by the
variables $a(\pm,\pm,\pm)$, we derive the following ordinary $(2 \times 2)$ matrices from it:

• $A_+(1) := A_{+\bullet\bullet}$ denotes the layer consisting of the entries $a(+,\bullet,\bullet)$,
i.e., the right hand face of the cube depicted above; the remaining (left) one forms the matrix
$A_-(1) := A_{-\bullet\bullet}$. Similarly, we may define $A_\pm(2) := A_{\bullet\pm\bullet}$ and
$A_\pm(3) := A_{\bullet\bullet\pm}$.

• Considering the sum of the layers, we obtain $A_\Sigma(i) := A_+(i) + A_-(i)$ for $i = 1, 2, 3$.

Using this new terminology, we may recover the quadratic co-factors from the end of (2.3) as

$$ f_{2i} = \det A_\Sigma(i) \quad \text{with } i = 1, 2, 3. $$

Fixing a disease index $i$, the elimination done by Singular tells us that $e_i$ and $q_i$ both obey
the same quadratic equation. It is

$$ \big( \det A_\Sigma(i) \big)\, x^2
   - \big( \det A_\Sigma(i) + \det A_+(i) - \det A_-(i) \big)\, x
   + \det A_+(i) = 0 . $$
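The discriminant is the hyperdeterminant $\det A_{\bullet\bullet\bullet} = f_4$ again. Hence, the
solutions for $e_i$ and $q_i$ are

$$ (e_i)_{1,2} \text{ and } (q_i)_{1,2}
   = \frac{1}{2} \left( 1 + \frac{\det A_+(i) - \det A_-(i) \pm \sqrt{\det A_{\bullet\bullet\bullet}}}{\det A_\Sigma(i)} \right)
   = \frac{1}{2} \left( 1 + \frac{g_{2i} \pm \sqrt{f_4}}{f_{2i}} \right) $$

with $g_{2i}$ being the quadratic polynomials

$$
g_{2i} := \det A_+(i) - \det A_-(i) =
\begin{cases}
-a_0 a_3 + a_1 a_2 + a_4 a_7 - a_5 a_6 & (\text{for } i = 1) \\
-a_0 a_5 + a_1 a_4 + a_2 a_7 - a_3 a_6 & (\text{for } i = 2) \\
-a_0 a_6 + a_1 a_7 + a_2 a_4 - a_3 a_5 & (\text{for } i = 3) .
\end{cases}
$$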
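The layer notation translates directly into code. The following Python sketch (helper names `layer`, `det2`, and `quadratic_roots` are hypothetical) extracts the $2 \times 2$ layers $A_\pm(i)$ from the counts, forms $f_{2i}$ and $g_{2i}$, and returns both roots of the quadratic equation for a given disease index $i$; which root is $e_i$ and which is $q_i$ is not decided at this point.

```python
from math import sqrt

def layer(a, i, sign):
    """Entries of the 2x2 layer A_sign(i) in row-major order.

    i is the disease index (1, 2, 3); sign is 1 for '+' and 0 for '-'.
    The index k of a_k encodes the sign triple with the first sign as the top bit.
    """
    shift = 3 - i
    return [a[k] for k in range(8) if (k >> shift) & 1 == sign]

def det2(m):
    """Determinant of a 2x2 matrix given as (m00, m01, m10, m11)."""
    return m[0] * m[3] - m[1] * m[2]

def quadratic_roots(a, i):
    """Both roots of f_2i x^2 - (f_2i + g_2i) x + det A_+(i) = 0."""
    plus, minus = layer(a, i, 1), layer(a, i, 0)
    f2 = det2([p + m for p, m in zip(plus, minus)])   # det A_Sigma(i)
    g2 = det2(plus) - det2(minus)
    disc = (f2 + g2) ** 2 - 4 * f2 * det2(plus)       # equals f_4 for every i
    if f2 == 0 or disc < 0:
        raise ValueError("degenerate case: f_2i = 0 or negative discriminant")
    root = sqrt(disc)
    return (f2 + g2 - root) / (2 * f2), (f2 + g2 + root) / (2 * f2)

# Toy counts (not ESEN data): a_0 = a_1 = 1, a_7 = 6.
a = (1, 1, 0, 0, 0, 0, 0, 6)
for i in (1, 2, 3):
    print(i, quadratic_roots(a, i))
```

For the toy counts the discriminant is 36 for all three indices, in line with the statement that it is always the hyperdeterminant $f_4$.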



(2.5) Assuming the general case of $f_1 \neq 0$, $\det A_{\bullet\bullet\bullet} \neq 0$, and
$\det A_\Sigma(i) \neq 0$ for each $i = 1, 2, 3$, we have narrowed the number of possible values for
each of the variables $v$, $e_i$, and $q_i$ down to two. It remains to check which of the $2^7$
combinations survive to provide an actual solution of the original system (2.1).

This can easily be done by considering the sum of those equations out of the original system that
correspond to a certain face of the cube depicted in (2.3). For instance, adding up the equations
for $a_7, a_6, a_5$, and $a_4$ provides

$$ a_7 + a_6 + a_5 + a_4 = f_1\, v\, q_1 + f_1\, (1 - v)\, e_1 . $$

All variables have been eliminated except $v$, $q_1$, and $e_1$. This allows us to show that the
$e$'s must not equal the $q$'s. (Assuming $e_1 = q_1$, we would obtain
$a_7 + a_6 + a_5 + a_4 = f_1 e_1$. However, substituting this value of $e_1$ into the quadratic
equation of (2.4) yields

$$ f_1^2 \big( f_{21}\, e_1^2 - (f_{21} + g_{21})\, e_1 + \det A_+(1) \big) = - f_{22}\, f_{23} , $$

which is generally different from zero.)

Now, by Remark (2.3)(1), we may assume that, w.l.o.g., $v = (f_1 \sqrt{f_4} + f_3)/(2 f_1 \sqrt{f_4})$.
Hence, with $e_1 = (f_{21} + g_{21} \mp \sqrt{f_4})/(2 f_{21})$ and
$q_1 = (f_{21} + g_{21} \pm \sqrt{f_4})/(2 f_{21})$, the above equation multiplied with
$4 f_{21} \sqrt{f_4}$ becomes

$$
\begin{aligned}
4 f_{21} \sqrt{f_4}\, \Big( \sum_{k=4}^{7} a_k \Big)
 &= (f_1 \sqrt{f_4} + f_3)\,(f_{21} + g_{21} \pm \sqrt{f_4})
  + (f_1 \sqrt{f_4} - f_3)\,(f_{21} + g_{21} \mp \sqrt{f_4}) \\
 &= 2 f_1 \sqrt{f_4}\, (f_{21} + g_{21}) \pm 2 f_3 \sqrt{f_4} .
\end{aligned}
$$

In particular, since $2 f_{21} \big( \sum_{k=4}^{7} a_k \big) = f_1 (f_{21} + g_{21}) + f_3$, only the
signs on top survive in the formulas of $e_1$ and $q_1$.

Finally, one may use Singular again for checking that these values, together with the similar ones
for the remaining variables, indeed yield a solution of the original system. This means that we
have shown the following

Theorem: If $f_1, f_4, f_{2i} \neq 0$ for $i = 1, 2, 3$, then the polynomial system of (2.1), with the
adaption $p_k = a_k/n$ made in (2.2), has exactly two solutions. They are

$$ v = \frac{f_1 \sqrt{f_4} \pm f_3}{2 f_1 \sqrt{f_4}} , \qquad
   e_i = \frac{f_{2i} + g_{2i} \mp \sqrt{f_4}}{2 f_{2i}} , \qquad
   q_i = \frac{f_{2i} + g_{2i} \pm \sqrt{f_4}}{2 f_{2i}} \qquad (i = 1, 2, 3). $$

If some of the above polynomials $f_\bullet$ do vanish, then the system (2.1) might have infinitely
many solutions or no solution at all.
3 The MMR coverage



(3.1) If we apply the previous theory to our statistical problem of estimating the MMR
coverage, then $a_k$ stands for the number of persons of a prefixed age group observed to have
antibody status $k$ ($k = 0, \dots, 7$). Thus, $f_1$ is the size of the cohort, and this number is
automatically positive. On the other hand, we would like to interpret the solutions $v$, $e_i$,
$q_i$, and $s_i$ of the MMR system as estimations of the probabilities described in (2.1). In
particular, they should be real numbers and, moreover, be contained in the interval $[0, 1]$.

While in [Ga] the latter is forced by the numerical program used to solve the system, our solutions
may not have these properties. However, this should not be considered problematic, but a feature
of our method. If the solutions fall outside the meaningful range, this is a strong hint that the
input data $a_k$ are of poor quality.

(3.2) In the following, we will formulate the conditions the input data have to fulfill for
yielding appropriate results. Moreover, we will see that, in the statistical context, only one of the
two solutions mentioned in Theorem (2.5) survives.

Theorem: Let $a_k$ be the observed number of people in a fixed age group with antibody status $k$.
Then, the MMR system has a good statistical solution if and only if

$$ f_4(a) > 0 \quad \text{and} \quad f_{2i}(a) \ge \sqrt{f_4(a)} + |g_{2i}(a)| \quad (i = 1, 2, 3). $$

If these conditions are satisfied, then the estimation for $v, e_i, s_i$ is

$$ v = \frac{f_1 \sqrt{f_4} + f_3}{2 f_1 \sqrt{f_4}} , \qquad
   e_i = \frac{f_{2i} + g_{2i} - \sqrt{f_4}}{2 f_{2i}} , \qquad
   s_i = \frac{2 \sqrt{f_4}}{f_{2i} - g_{2i} + \sqrt{f_4}} \qquad (i = 1, 2, 3). $$



Proof: Positivity of $f_4$ means that the solutions described in Theorem (2.5) are real. Assuming
this, we have

$$ v \in [0, 1] \iff f_1 \sqrt{f_4} \pm f_3 \ge 0 \iff f_1^2 f_4 \ge f_3^2 . $$

On the other hand, we have seen in (2.3) that

$$ f_1^2 f_4 = c_2 = (c_2 - 4 c_0) + 4 c_0 = f_3^2 + 4 f_{21} f_{22} f_{23} . $$

Hence, the condition "$v \in [0, 1]$" is equivalent to $f_{21} f_{22} f_{23} \ge 0$.
Since $s_i = (q_i - e_i)/(1 - e_i)$, we know that

$$ e_i, s_i \in [0, 1] \iff 0 \le e_i \le q_i \le 1 . $$

From Theorem (2.5) we obtain, depending on the choice of the solution, that
$q_i - e_i = \sqrt{f_4}/f_{2i}$ for $i = 1, 2, 3$ or that $q_i - e_i = -\sqrt{f_4}/f_{2i}$ for
$i = 1, 2, 3$. Anyway, for $q_i \ge e_i$, the polynomials $f_{21}, f_{22}, f_{23}$ must have the same
sign. Together with $f_{21} f_{22} f_{23} \ge 0$ obtained above, this means that
$f_{21}, f_{22}, f_{23} > 0$. In particular, looking at Theorem (2.5), only the solution with the top
sign survives.

Finally, it is easy to see that the conditions $e_i \ge 0$ and $q_i \le 1$ translate into
$f_{2i} \ge \sqrt{f_4} - g_{2i}$ and $f_{2i} \ge \sqrt{f_4} + g_{2i}$, respectively. □
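The theorem can be turned into a small estimator that first tests the conditions and then evaluates the three closed formulas. The Python sketch below is only an illustration under the formulas stated above; the function name `mmr_estimate`, the convention of returning `None` when the conditions fail, and the toy counts are choices made for this example.

```python
from math import sqrt

def mmr_estimate(a):
    """Return (v, [e_1, e_2, e_3], [s_1, s_2, s_3]) for counts a = (a_0, ..., a_7),
    or None if the conditions f_4 > 0 and f_2i >= sqrt(f_4) + |g_2i| fail."""
    a0, a1, a2, a3, a4, a5, a6, a7 = a
    x, y = (a0, a3, a5, a6), (a7, a4, a2, a1)
    f1 = sum(a)
    f3 = ((sum(x) - sum(y)) * sum(xi * yi for xi, yi in zip(x, y))
          - 2 * sum(xi * yi * (xi - yi) for xi, yi in zip(x, y))
          + 2 * ((a3*a5*a6 + a0*a5*a6 + a0*a3*a6 + a0*a3*a5)
                 - (a4*a2*a1 + a7*a2*a1 + a7*a4*a1 + a7*a4*a2)))
    f4 = (sum((xi * yi) ** 2 for xi, yi in zip(x, y))
          + 4 * (a0*a3*a5*a6 + a7*a4*a2*a1)
          - 2 * (a0*a7*a3*a4 + a0*a7*a5*a2 + a0*a7*a6*a1
                 + a3*a4*a5*a2 + a3*a4*a6*a1 + a5*a2*a6*a1))
    f2 = [(a0 + a4) * (a7 + a3) - (a1 + a5) * (a6 + a2),
          (a0 + a2) * (a7 + a5) - (a4 + a6) * (a3 + a1),
          (a0 + a1) * (a7 + a6) - (a2 + a3) * (a5 + a4)]
    g2 = [-a0*a3 + a1*a2 + a4*a7 - a5*a6,
          -a0*a5 + a1*a4 + a2*a7 - a3*a6,
          -a0*a6 + a1*a7 + a2*a4 - a3*a5]
    if f4 <= 0:
        return None
    r = sqrt(f4)
    if any(f < r + abs(g) for f, g in zip(f2, g2)):
        return None
    v = (f1 * r + f3) / (2 * f1 * r)
    e = [(f + g - r) / (2 * f) for f, g in zip(f2, g2)]
    s = [2 * r / (f - g + r) for f, g in zip(f2, g2)]
    return v, e, s

# Toy counts (not ESEN data) generated from v = 0.75, e = (0, 0, 0.5), s = (1, 1, 1), n = 8.
print(mmr_estimate((1, 1, 0, 0, 0, 0, 0, 6)))
```

For the toy counts the estimator returns exactly the parameters that generated them. With real data, rounding in floating point may push a borderline case to either side of the inequality, so exact rational arithmetic would be a safer choice near the boundary.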

(3.3) Remark: If one is only interested in the MMR coverage $v$, then the conditions ensuring
a meaningful result may be weakened. It follows from the proof of the previous theorem that

$$ f_4(a) > 0 \quad \text{and} \quad f_{2i}(a) > 0 \quad (i = 1, 2, 3) $$

will do.
4 Data



(4.1) To illustrate our results, we have chosen data from one country of the ESEN
Project, [Ga]. These data have not yet been finalized, as they might be changed according to a
new standardization between the European countries. For that reason, the use of these data here
is for illustrative purposes only.

The input, i.e., the sampled variables $a_k$, may be found in the table of (4.2). The first table
compares our estimation of $v, e_1, e_2$, and $e_3$ by age groups (AG) with that obtained by Gay in
[Ga]; the variables pointing to his values carry a tilde.



AG    ṽ       v       ẽ1      e1      ẽ2      e2      ẽ3      e3       s1      s2      s3
 1    0.227   0.227   0.003   0.005   0.019   0.019   0.014    0.011   0.950   0.861   0.974
 2    0.642   0.642   0.122   0.144   0.020   0.017   0.090    0.090   0.976   0.878   0.922
 3    0.715   0.710   0.122   0.112   0.041   0.046   0.090    0.087   1.002   0.912   0.930
 4    0.837   0.824   0.251   0.279   0.041   0.054   0.106    0.219   1.003   0.886   0.922
 5    0.859   0.863   0.292   0.252   0.241   0.227   0.106    0.000   1.000   0.886   0.921
 6    0.794   0.889   0.621   0.427   0.324   0.094   0.106   -0.037   0.961   0.855   0.830
 7    0.645   0.847   0.756   0.550   0.502   0.006   0.256    0.258   0.949   0.938   0.678
 8    0.662   0.794   0.764   0.652   0.502   0.285   0.411    0.356   0.969   0.877   0.798
 9    0.576   0.900   0.764   0.588   0.665   0.279   0.481   -0.007   0.833   0.857   0.838
10    0.478   0.940   0.906   0.667   0.734   0.049   0.631    0.450   0.906   0.892   0.660



The main difference between Gay's and our results can be found in the values of $v, e_1, e_2, e_3$
in the higher age groups.

Moreover, while Gay has assumed age independent seroconversion rates, our solutions
$s_1, s_2, s_3$ do vary with age; the most striking example is the rubella seroconversion $s_3$. The
comparison of Gay's values with the age average of our solutions for $s_1, s_2, s_3$ is as follows:

Seroconversion by N. Gay:          0.989   0.880   0.910
Average of our $s_1, s_2, s_3$:    0.955   0.884   0.847
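The age averages quoted above can be recomputed from the $s_1, s_2, s_3$ columns of the first table. The following Python lines are a minimal check of that arithmetic; the dictionary layout and variable names are chosen here for illustration.

```python
# Seroconversion estimates by age group 1..10, copied from the s1, s2, s3 columns above.
s_table = {
    "s1": [0.950, 0.976, 1.002, 1.003, 1.000, 0.961, 0.949, 0.969, 0.833, 0.906],
    "s2": [0.861, 0.878, 0.912, 0.886, 0.886, 0.855, 0.938, 0.877, 0.857, 0.892],
    "s3": [0.974, 0.922, 0.930, 0.922, 0.921, 0.830, 0.678, 0.798, 0.838, 0.660],
}

# Unweighted mean over the ten age groups, rounded to three decimals.
averages = {name: round(sum(vals) / len(vals), 3) for name, vals in s_table.items()}
print(averages)
```

The unweighted means reproduce the three values 0.955, 0.884, and 0.847 listed above.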



(4.2) We can use the equations of (2.1) to re-calculate the expected antibody prevalence out
of the solutions obtained for $v, e_i, s_i$. In other words, for each antibody status $(\pm,\pm,\pm)$
we are looking for the number of people that should have been observed to yield the desired result.
Because we used an exact method, it is no surprise that our solutions give exactly back the input
data; they fill the $a_k$-columns in the following table. On the other hand, using Gay's solutions,
we obtain different values which are contained in the $\tilde a_k$-columns:









AG     - - -         - - +        - + -        - + +        + - -        + - +        + + -        + + +
       ã0     a0     ã1    a1     ã2    a2     ã3    a3     ã4    a4     ã5    a5     ã6    a6     ã7     a7
 1    155.8   156    2.3    2     3.1    3     0.5    2     1.0    1     5.0    6     3.7    1     37.7    38
 2     49.1    48    5.0    5     1.1    1     1.0    2     7.9    9    12.7   13     8.2    7     90.2    90
 3     40.8    42    4.2    4     1.8    2     1.2    0     6.9    6    14.6   11     9.8    8    107.6   114
 4     20.1    18    2.5    5     1.0    1     1.2    0     8.2    8    17.7   18    11.6    9    129.7   133
 5     14.6    17    1.8    0     4.7    5     1.8    0     7.3    7    16.1   15    15.3   15    153.4   156
 6     10.2    13    1.3    0     5.0    2     1.2    3    17.9   14    14.8   20    20.7   30    145.9   135
 7      6.9    11    2.4    4     7.0    1     2.7    3    21.9   16    15.1   13    30.3   40    128.7   127
 8      5.0     7    3.5    4     5.0    3     3.8    3    16.5   15    19.1   20    23.2   25    135.9   135
 9      3.4     6    3.1    1     6.7    4     6.5    9    11.1   11    14.4   14    26.7   27    122.1   122
10      0.9     2    1.5    2     2.4    1     4.2    4     8.5    7    17.1   17    26.1   28    121.2   121
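To see how the $\tilde a_k$ values arise, one can feed Gay's estimates for one age group into the equations of (2.1), multiplied by the cohort size. The Python sketch below does this for age group 1, using the rounded values printed in the tables of (4.1) ($\tilde v = 0.227$, $\tilde e = (0.003, 0.019, 0.014)$, Gay's seroconversions $(0.989, 0.880, 0.910)$) and $n = 209$, the sum of the observed counts in the first row. The function name `expected_counts` is hypothetical, and small deviations from the table are to be expected because the inputs are rounded to three decimals.

```python
def expected_counts(n, v, e, s):
    """Expected counts [a_0, ..., a_7] from the equations of (2.1) scaled by n."""
    q = [ei + (1 - ei) * si for ei, si in zip(e, s)]   # q_i = e_i + (1 - e_i) s_i
    counts = []
    for k in range(8):
        vaccinated, unvaccinated = v, 1 - v
        for i in range(3):
            plus = (k >> (2 - i)) & 1          # sign of disease i+1 in status k
            vaccinated *= q[i] if plus else 1 - q[i]
            unvaccinated *= e[i] if plus else 1 - e[i]
        counts.append(n * (vaccinated + unvaccinated))
    return counts

# Gay's (rounded) estimates for age group 1; n = 156 + 2 + 3 + 2 + 1 + 6 + 1 + 38.
gay_ag1 = expected_counts(209, 0.227, [0.003, 0.019, 0.014], [0.989, 0.880, 0.910])
print([round(c, 1) for c in gay_ag1])
```

The first and last entries come out close to the tabulated 155.8 and 37.7, which also confirms that the tilde columns of the first table are Gay's values.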
(4.3) In the following, we will discuss some of the properties of our solutions.

(1) One should not worry too much about negative rates or rates above 1 as they appear among
the $e_i$ or $s_i$. In all those cases, the values are very close to the allowed range.

(2) Our major concern is caused by the exposition factors $e_2$ and $e_3$. They seem to be very small
in the higher age groups and, additionally, they do not increase with age.

For the latter, however, we may use the same explanation as Gay did for the decline of his $v$
in older cohorts, namely that the data arise from different cohorts in each age group.

(3) As already mentioned before, we did not assume ad hoc that the seroconversions $s_i$ are age
independent. However, as a result of our calculations, we obtained values for mumps and
measles that did not vary greatly, and the averages are quite close to Gay's values.

On the other hand, the seroconversion factor for rubella shows an unusual behavior in the
higher age groups and we would be interested in an explanation for it.

The major difference between Gay's and our approach is the following:

Altmann: We consider each age group separately; this yields a system of 7 equations in 7 variables
for each group, allowing exact solutions with easy formulas.

Gay: He considers 10 age groups at once, yielding a system with 70 equations in 70 variables.
Moreover, he creates additional restrictions by

• assuming that the seroconversion is age independent (which removes 27 variables),

• and by forcing the exposition factors $e_i$ to increase with age (which introduces additional
inequalities).

For the remaining system, Gay uses a numerical approach to find values for $v$(age), $e_i$(age), and
$s_i$ that fit the system as well as possible. Exact solutions are, of course, out of reach.

Thus, the fact that the above problem (2) does not occur in Gay's solutions is no surprise at all.
It was part of his method to force all these properties, which are, however, biologically plausible.
An advantage of Gay's method is that imperfect data in single age groups might be corrected by
the better ones.

On the other hand, our method tells which data are better or worse and gives information about
their quality. Moreover, besides exactness, the main advantage of our approach seems to be that
the formulas for $v, e_i, s_i$ are mutually independent. Hence, even if one dislikes the results for
the $e_i$'s or $s_i$'s, one still has an explicit formula for the MMR coverage $v$ which works well.